Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract We propose a new approach to deriving quantitative mean field approximations for any probability measure $$P$$ on $$\mathbb {R}^{n}$$ with density proportional to $$e^{f(x)}$$, for $$f$$ strongly concave. We bound the mean field approximation for the log partition function $$\log \int e^{f(x)}dx$$ in terms of $$\sum _{i \neq j}\mathbb {E}_{Q^{*}}|\partial _{ij}f|^{2}$$, for a semi-explicit probability measure $$Q^{*}$$ characterized as the unique mean field optimizer, or equivalently as the minimizer of the relative entropy $$H(\cdot \,|\,P)$$ over product measures. This notably does not involve metric-entropy or gradient-complexity concepts which are common in prior work on nonlinear large deviations. Three implications are discussed, in the contexts of continuous Gibbs measures on large graphs, high-dimensional Bayesian linear regression, and the construction of decentralized near-optimizers in high-dimensional stochastic control problems. Our arguments are based primarily on functional inequalities and the notion of displacement convexity from optimal transport.more » « less
-
null (Ed.)AI for good (AI4G) projects involve developing and applying ar- tificial intelligence (AI) based solutions to further goals in areas such as sustainability, health, humanitarian aid, and social justice. Developing and deploying such solutions must be done in collab- oration with partners who are experts in the domain in question and who already have experience in making progress towards such goals. Based on our experiences, we detail the different aspects of this type of collaboration broken down into four high-level cat- egories: communication, data, modeling, and impact, and distill eleven takeaways to guide such projects in the future. We briefly describe two case studies to illustrate how some of these takeaways were applied in practice during our past collaborations.more » « less
-
Abstract MotivationSingle cell RNA-seq (scRNA-seq) data contains a wealth of information which has to be inferred computationally from the observed sequencing reads. As the ability to sequence more cells improves rapidly, existing computational tools suffer from three problems. (i) The decreased reads-per-cell implies a highly sparse sample of the true cellular transcriptome. (ii) Many tools simply cannot handle the size of the resulting datasets. (iii) Prior biological knowledge such as bulk RNA-seq information of certain cell types or qualitative marker information is not taken into account. Here we present UNCURL, a preprocessing framework based on non-negative matrix factorization for scRNA-seq data, that is able to handle varying sampling distributions, scales to very large cell numbers and can incorporate prior knowledge. ResultsWe find that preprocessing using UNCURL consistently improves performance of commonly used scRNA-seq tools for clustering, visualization and lineage estimation, both in the absence and presence of prior knowledge. Finally we demonstrate that UNCURL is extremely scalable and parallelizable, and runs faster than other methods on a scRNA-seq dataset containing 1.3 million cells. Availability and implementationSource code is available at https://github.com/yjzhang/uncurl_python. Supplementary informationSupplementary data are available at Bioinformatics online.more » « less
An official website of the United States government
